
BMC Medical Genomics

Springer Science and Business Media LLC

Preprints posted in the last 90 days, ranked by how well they match BMC Medical Genomics's content profile, based on 12 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.

1
Integrative Multi-Omics Analysis Reveals Novel Molecular Signatures, Disease Stratification and Therapeutic Opportunities in Primary Ciliary Dyskinesia: First AI-ML empowered platform towards precision medicine targeting human ciliopathies

Jitender, ; Hossain, M. W.; Mohanty, S.; Kateriya, S.

2026-01-14 health informatics 10.64898/2026.01.12.26343910
Top 0.1%
150× avg

Primary ciliary dyskinesia (PCD) belongs to a group of rare genetic disorders that are extremely hard to diagnose and treat. Current diagnostic modalities detect only 70% of cases and are technically demanding, which calls for novel computational approaches to biomarker discovery and therapeutic target identification. We developed an integrative computational pipeline to analyse transcriptomic data from 6 PCD patients and 9 healthy controls. We identified 1,249 differentially expressed genes (false discovery rate below 0.05, absolute log2 fold-change exceeding 1), revealing oxidative stress as a central pathophysiological mechanism, with glutathione S-transferase theta 2B (GSTT2B) emerging as a master regulatory hub. WGCNA detected 12 co-expression modules, three of which were significantly disease-associated. Machine learning achieved high diagnostic performance with a minimal 10-gene signature (accuracy 0.93), and the Random Forest area under the receiver operating characteristic curve was 0.96 ± 0.03. The study also examined uncharacterized genes, such as FRMPD3, C1orf194, and METTL26, that had not previously been associated with PCD. Drug-repurposing analysis identified FDA-approved drugs, including N-acetylcysteine, metformin, and resveratrol, as top candidates for therapeutic intervention in PCD. Age-dependent classification revealed 156 genes with significant disease-progression interactions, and gender-associated classification identified 342 sex-specific responsive genes. Background: Primary ciliary dyskinesia (PCD) is a rare genetic disorder arising from ciliary dysfunction. It causes severe respiratory illness, including chronic infections, bronchiectasis, and significant morbidity. 
Although more than 50 PCD genes have been identified, the molecular mechanisms underlying PCD pathophysiology remain unclear. This gap in understanding has contributed to unsuccessful therapeutic interventions, highlighting the need for robust PCD-specific molecular characterization. Methods: This study performed an integrated computational analysis of transcriptomic data from the GSE25186 dataset, which comprises nasal epithelial cell samples from six confirmed PCD cases and nine healthy controls. The approaches included empirical Bayes moderated t-statistics, weighted gene co-expression network analysis (WGCNA) with soft threshold β = 6, comprehensive pathway enrichment across the KEGG, Reactome, and GO databases, machine learning classification using Random Forest and Support Vector Machines, temporal trajectory inference through pseudotime analysis, and systematic drug-repurposing screening against the DrugBank v5.1.8 and ChEMBL v29 databases. Results: We identified 1,249 differentially expressed genes (adjusted p-value < 0.05, |log2FC| > 1), comprising 533 upregulated and 716 downregulated genes. WGCNA identified 12 co-expression modules, three of which were significantly associated with disease: brown (r = 0.78, p = 2x10-), blue (r = -0.65, p = 0.008), and green (r = 0.82, p = 0.001). Machine learning classification yielded high diagnostic performance, with a Random Forest AUC of 0.96 ± 0.03, and produced a minimal 10-gene diagnostic signature. N-acetylcysteine (NAC) emerged as the top therapeutic candidate (composite score 2.46), followed by metformin (1.85) and resveratrol (0.28). Systems biology-based classification by age revealed progressive molecular deterioration. 
A total of 156 genes showed a significant age × disease interaction (false discovery rate < 0.05). Gender stratification identified 342 differentially responsive genes, providing a basis for sex-specific therapeutic interventions. Conclusions: This multi-omics analysis provides new insights into PCD molecular pathophysiology. Oxidative stress (GSTT2B, GPX1, SOD2) and disrupted protein homeostasis (HSPA8, PDIA3, CALR) emerged as central regulators of disease progression. The study points to reliable diagnostic markers and readily available FDA-approved drug candidates for PCD therapy. Further, age- and gender-associated classification of PCD biomarkers offers a new path toward tailored medicine. This study establishes a robust molecular framework for developing therapeutics for rare genetic diseases.
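As a rough illustration of the differential-expression cutoff quoted above (FDR < 0.05, |log2FC| > 1), the Python sketch below applies a Benjamini-Hochberg adjustment to raw p-values and then filters on fold change. It assumes Benjamini-Hochberg is the FDR procedure and takes precomputed log2 fold changes and p-values as input; the gene names and values are hypothetical, and this is not the authors' pipeline, which computed empirical Bayes moderated t-statistics upstream.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1 and enforces monotonicity
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, fdr=0.05, min_abs_lfc=1.0):
    """genes: list of (name, log2_fold_change, raw_p). Returns (up, down) gene-name lists."""
    padj = bh_adjust([p for _, _, p in genes])
    up, down = [], []
    for (name, lfc, _), q in zip(genes, padj):
        if q < fdr and abs(lfc) > min_abs_lfc:
            (up if lfc > 0 else down).append(name)
    return up, down


# Hypothetical example values, not data from GSE25186.
toy = [
    ("GENE_A", 2.3, 0.0001),
    ("GENE_B", -1.8, 0.0004),
    ("GENE_C", 0.4, 0.0002),  # significant but below the fold-change cutoff
    ("GENE_D", 3.1, 0.20),    # large fold change but not significant
]
print(select_degs(toy))  # (['GENE_A'], ['GENE_B'])
```

Requiring both criteria is what drops GENE_C (small effect) and GENE_D (not significant) in the toy example.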

2
Genetic Evidence for Opposing Associations of Psoriasis and Type 2 Diabetes with Inflammatory Bowel Disease: A Mendelian Randomization Study

Orkild, M. R.; Dybdahl, K. L.; Duun Rohde, P. D.

2026-02-27 genetic and genomic medicine 10.64898/2026.02.25.26346967
Top 0.1%
45× avg

Inflammatory bowel disease (IBD) frequently co-occurs with immune-mediated and metabolic disorders, but whether these associations reflect shared genetics or causal effects remains unclear. We performed two-sample Mendelian randomization (MR) using large-scale genome-wide association study (GWAS) summary statistics to investigate potential causal effects of immune-mediated diseases and lifestyle traits on IBD, Crohn's disease (CD), and ulcerative colitis (UC). SNP-based heritability and genetic correlations were estimated to contextualize findings. Following false discovery rate correction, genetically predicted psoriasis was positively associated with IBD (OR 1.15), CD (OR 1.23), and UC (OR 1.10), with the strongest effect observed for CD. Genetically predicted type 2 diabetes mellitus (T2DM) showed a modest inverse association with UC (OR 0.88). No lifestyle-related traits remained significant after correction. Sensitivity analyses indicated heterogeneity across instruments and evidence of directional pleiotropy in selected models, whereas no pleiotropy was detected for the T2DM-UC association. These findings support a role for psoriasis-related immune pathways in IBD susceptibility and suggest a potential inverse association between genetic liability to T2DM and UC.
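The abstract does not name the MR estimator; a common primary choice in two-sample MR is the fixed-effect inverse-variance weighted (IVW) estimate, sketched below as an assumption. Each SNP's exposure and outcome associations are combined with weights equal to the inverse variance of the outcome association, and an odds ratio like those reported is obtained by exponentiating a log-odds estimate. The SNP effect sizes in the usage lines are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate: sum(bx*by/se^2) / sum(bx^2/se^2), with its standard error."""
    inv_var = [1.0 / (se * se) for se in se_outcome]
    num = sum(bx * by * w for bx, by, w in zip(beta_exposure, beta_outcome, inv_var))
    den = sum(bx * bx * w for bx, w in zip(beta_exposure, inv_var))
    return num / den, math.sqrt(1.0 / den)


def to_odds_ratio(beta, se, z=1.96):
    """Exponentiate a log-odds estimate and its approximate 95% confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical SNP-level summary statistics (log-odds scale).
bx = [0.10, 0.25, 0.05]
by = [0.010, 0.025, 0.005]
se_y = [0.01, 0.02, 0.01]
beta, se = ivw_estimate(bx, by, se_y)
print(round(beta, 4), round(to_odds_ratio(beta, se)[0], 3))  # 0.1 1.105
```

Sensitivity analyses such as those mentioned in the abstract (heterogeneity, directional pleiotropy) would be run alongside this estimate and are not shown.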

3
Shared genetic factors between lung function and asthma by age at onset

Li, Y.; Cornejo-Sanchez, D. M.; Dong, R.; Naderi, E.; Wang, G. T.; Leal, S. M.; DeWan, A. T.

2026-02-26 genetic and genomic medicine 10.64898/2026.02.20.26346655
Top 0.1%
43× avg

The genetic relationship between asthma and lung function may depend on the age at onset (AAO) of asthma. We investigated whether the shared genetics between asthma and lung function differs by AAO. Asthma cases from UK Biobank were subset according to their AAO, and genetic correlation was used to obtain genetically homogeneous groups, i.e., ≤20 (LT20), 20-40, and >40 (GT40) years. Association analysis and fine-mapping were performed to identify shared genetics between AAO groups and lung function. Mediation and quantitative trait locus (QTL) analyses were performed to identify mechanisms underlying shared genetic associations. Chr5, chr6, chr12, and chr17 each had one region that displayed a replicated cross-phenotype association with at least one AAO group and lung function. Overlapping credible sets obtained from fine-mapping were observed on chr5 and chr6. Mediation analyses demonstrated that, for each region, the proportion of the effect on lung function mediated through asthma was larger for asthma LT20 than for 20-40 and GT40, suggesting that these regions' effects on lung function were more strongly driven by the association with LT20 asthma. Tissue-specific QTL analysis revealed that the shared etiology on chr5 may act through SLC22A5 and C5orf56, which might play an important role in decreased lung function among individuals with earlier-onset asthma.

4
Benchmarking HLA genotyping from whole-genome sequencing across multiple sequencing technologies

Cremin, C.; Elavalli, S.; Paulin, L.; Arres Reche, J.; Saad, A. A. Y. A.; Attia, A.; Minas, C.; Aldhuhoori, F.; Katagi, G.; Wu, H.; Sidahmed, H.; Mafofo, J.; Soliman, O.; Behl, S.; Pariyachery, S.; Gupta, V.; Ghanem, D.; Sajjad, H.; Cardoso, T.; El-Khani, A.; Al Marzooqi, F.; Magalhaes, T.; Sedlazeck, F. J.; Quilez, J.

2026-02-12 health informatics 10.64898/2026.02.10.26345621
Top 0.1%
38× avg

Background: The hyperpolymorphic nature and structural complexity of the human leukocyte antigen (HLA) genomic region present challenges for accurate and scalable typing across diverse sample types. While whole-genome sequencing (WGS) offers the opportunity to infer HLA genotypes without targeted enrichment, systematic benchmarks across sequencing platforms, biospecimens and coverage levels remain limited. Results: We assembled a multi-platform resource of WGS datasets derived from short-read (Illumina, MGI) and long-read (Oxford Nanopore Technologies R9 and R10) sequencing, spanning 29 biospecimens including cell lines, blood, buccal swab and saliva. We evaluated the performance of the HLA caller HLA*LA across 13 HLA genes, using a clinically validated assay as reference. WGS-based HLA genotyping achieved ~95% accuracy across sequencing platforms, with Class I loci exhibiting higher accuracy than Class II. Cross-platform concordance was high, and performance remained consistent across Illumina, MGI and Oxford Nanopore chemistries. Analysis of blood, buccal swab and saliva samples showed that blood and buccal swabs supported accurate HLA inference, whereas saliva yielded reduced concordance. Downsampling experiments demonstrated that 15× coverage was sufficient to retain >95% accuracy at two-field resolution, with lower depths supporting lower-resolution typing. Conclusions: Our results demonstrate that WGS provides a robust, platform-agnostic framework for accurate HLA genotyping across sample types and coverage levels. These benchmarks establish practical conditions for reliable HLA inference and underscore the utility of WGS for population-scale HLA analyses and future clinical applications.
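Two-field resolution compares only the first two colon-separated fields of an allele name (for example, A*02:01:01:01 becomes A*02:01). The sketch below shows one way to score per-allele concordance against a reference assay at that resolution; the helper names are hypothetical, it does not parse HLA*LA's actual output format, it is not the benchmark's scoring code, and the genotypes are made up.

```python
def to_two_field(allele):
    """Truncate an allele name such as 'A*02:01:01:01' to two-field resolution ('A*02:01')."""
    gene, _, fields = allele.partition("*")
    return gene + "*" + ":".join(fields.split(":")[:2])


def two_field_concordance(called, reference):
    """Fraction of reference alleles recovered at two-field resolution.

    called / reference: hypothetical dicts mapping locus -> (allele1, allele2).
    Each locus is compared as an unordered pair, so swapped alleles still match.
    """
    matched = total = 0
    for locus, ref_pair in reference.items():
        remaining = [to_two_field(a) for a in called.get(locus, ())]
        for allele in (to_two_field(a) for a in ref_pair):
            total += 1
            if allele in remaining:
                remaining.remove(allele)  # each called allele can match only once
                matched += 1
    return matched / total if total else float("nan")


reference = {"A": ("A*02:01:01:01", "A*24:02:01"), "DRB1": ("DRB1*15:01:01", "DRB1*04:01")}
called = {"A": ("A*24:02", "A*02:01:01"), "DRB1": ("DRB1*15:01", "DRB1*04:05")}
print(two_field_concordance(called, reference))  # 0.75
```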

5
Cardiomyopathy-Associated Mutations in a Hotspot Region at the C-terminal Part of Desmin Coil-2 Domain Impair the Intermediate Filament Assembly

Reckmann, J.; Milting, H.; Voss, S.; Radukic, M. T.; Klag, F. I.; Flottmann, F.; Lütkemeyer, A.; Gross, J.; Gaertner, A.; Landwehr, S.; Anselmetti, D.; Hoyer, A.; Müller, K. M.; Gummert, J.; Walhorn, V.; Brodehl, A.

2025-12-17 cardiovascular medicine 10.64898/2025.12.15.25342325
Top 0.2%
34× avg

Background: The DES gene encodes the intermediate filament protein desmin, which connects different multi-protein complexes, such as the cardiac desmosomes, and is highly important for the structural integrity of cardiomyocytes. Pathogenic DES-mutations cause filament assembly defects leading to cardiomyopathies. However, most DES-variants listed in genetic disease databases are currently classified as variants of unknown significance. Here, we functionally characterized 21 different DES-variants of unknown significance and 18 additional proline variants, localized in a highly conserved stretch at the C-terminus of the desmin coil-2 subdomain. Methods: We introduced desmin variants by site-directed mutagenesis and investigated filament assembly by confocal microscopy in transfected cell lines and in cardiomyocytes derived from induced pluripotent stem cells. In addition, we purified recombinant wild-type and mutant desmin and analyzed filament formation by atomic force microscopy. Co-expression with wild-type desmin delivered by adeno-associated virus was used to model the heterozygous status of cardiomyopathy patients. Results: Twelve DES-variants of unknown significance formed cytoplasmic aggregates, which was confirmed by atomic force microscopy. Notably, these twelve variants disturbed filament assembly even when co-expressed with wild-type desmin. Using a proline screen, we showed that proline residues at nearly every position in this stretch cause filament assembly defects. By modelling the tetrameric structure of desmin, we demonstrated that specific heptad positions, as well as intra- and intermolecular ion bridge sites, are particularly susceptible to mutations that promote desmin aggregation. Conclusion: In summary, our study demonstrated that the highly conserved stretch at the C-terminus of the coil-2 subdomain is a hotspot region, where several pathogenic DES-mutations cause aberrant desmin aggregation. 
Based on our functional data, we suggest reclassifying the aggregate-forming variants as likely pathogenic mutations rather than variants of unknown significance. Our study may be relevant to the genetic counselling of cardiomyopathy patients with similar DES-variants.

6
ML-Guided GWAS Reveals Genetic Architectures for MASLD for Overweight and Lean Individuals in the All of Us Cohort

Nambiar, A.; Karambelkar, K.; Athreya, A.; Allen, A. M.; Lazaridis, K. N.; Donovan, S. M.; Maslov, S.

2025-12-20 health informatics 10.64898/2025.12.18.25342567
Top 0.2%
33× avg

Metabolic dysfunction-associated steatotic liver disease (MASLD) arises from excessive hepatic fat accumulation that triggers inflammation and liver injury. It is the most prevalent chronic liver disease worldwide, affecting more than one quarter of adults. Despite this, MASLD is often underdiagnosed, which makes genome-wide association studies (GWAS) more difficult to perform. In this paper, we implemented a machine learning (ML)-guided GWAS framework to identify genetic risk factors for MASLD in lean and overweight individuals in the All of Us Research Program. A random forest model trained on laboratory measurements, vital signs, and demographic features generated an in silico MASLD (I-MASLD) score, a continuous risk score for MASLD that was validated as an accurate representation of clinical MASLD diagnosis. This score was then used as the phenotype in a GWAS of whole-exome sequencing variants. The resulting GWAS identified a novel variant in the ANGPTL4 gene significantly associated with MASLD risk and recapitulated known variants in various genes involved in lipid metabolism and insulin signaling. Our results also suggest a potential role of APOA5 in MASLD onset or progression in lean patients. These findings demonstrate that ML-derived quantitative phenotypes can enhance genetic discovery in large, heterogeneous cohorts where conventional case/control labels are limited or imprecise.
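Using a continuous score such as I-MASLD as the GWAS phenotype means each variant is tested with a quantitative-trait regression rather than a case/control test. The minimal sketch below fits an ordinary least-squares regression of the score on additive genotype dosage for a single variant, with a normal approximation for the p-value. It is an assumption-laden illustration: it omits the covariates (such as age, sex and ancestry principal components) and relatedness handling a real analysis would include, the function name is hypothetical, and the data are made up.

```python
import math


def single_variant_ols(dosages, phenotype):
    """Regress a continuous phenotype on additive genotype dosage (0, 1, 2) for one variant.

    Returns (beta, standard_error, two_sided_p). The p-value uses a normal approximation
    to the t statistic, reasonable only for large samples; no covariates are included.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, p


# Hypothetical dosages and continuous risk scores for six participants.
beta, se, p = single_variant_ols([0, 0, 1, 1, 2, 2], [0.1, 0.2, 0.5, 0.6, 0.9, 1.0])
print(round(beta, 3))  # 0.4
```

Repeating this test across all variants, then applying a genome-wide significance threshold, gives the basic shape of a quantitative-trait GWAS.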

7
Transcriptomic Profiling of Peripheral Blood Mononuclear Cells Reveals Key Molecular Signatures in Chronic Kidney Disease Patients with Heart Failure

Shafreen, M.; Chakraborty, M.; Patil, L.; Navamani, S.; Shema, E.; Pujari, D.; More, S.; Satish, D.

2025-12-30 health informatics 10.64898/2025.12.29.25343179
Top 0.3%
26× avg

Background: Heart failure (HF) is a frequent and severe complication among patients with chronic kidney disease (CKD), particularly in advanced stages and end-stage renal disease (ESRD). This study focuses on the molecular interplay between CKD and HF beyond the context of maintenance hemodialysis (MHD). Because peripheral blood mononuclear cells (PBMCs) reflect systemic inflammatory and transcriptional alterations, we analyzed PBMC transcriptomes to uncover potential biomarkers and mechanistic links connecting CKD and HF. Methods: Publicly available RNA-seq data comprising PBMCs from 15 CKD patients with HF (SRX23265333) and 14 healthy controls (SRX19031772) were analyzed. Quality control was performed using FastQC and fastp, followed by alignment to the human reference genome with HISAT2. Gene counts were normalized, and differential expression was determined using DESeq2. Functional enrichment analyses (Gene Ontology and KEGG) identified key biological pathways. Protein-protein interaction (PPI) networks were constructed using STRING, and hub genes were validated through disease and gene associations in the Comparative Toxicogenomics Database (CTD). Results: Differential expression analysis revealed several genes significantly dysregulated in CKD patients with HF compared to controls. Enrichment results highlighted processes associated with extracellular matrix remodeling, immune activation, and cardiorenal fibrosis. PPI analysis identified four major hub genes (CCL2, ALB, EGFR, and COL1A2) as central nodes within the network. These genes are functionally linked to inflammatory signaling, vascular remodeling, and fibrotic progression, consistent with the pathophysiological mechanisms of HF and CKD. CTD validation further confirmed their association with cardiorenal dysfunction. Discussion: This integrative transcriptomic study identifies CCL2, ALB, EGFR, and COL1A2 as key PBMC-expressed hub genes linking CKD and HF. 
The findings enhance understanding of the molecular basis of cardiorenal syndrome and propose candidate biomarkers and therapeutic targets for future translational research.

8
Whole Genome Sequencing Identifies Crucial Diagnostic Biomarkers and Therapeutic Targets in Premature Coronary Artery Disease in South Asians: A Pilot Study

Ali, Y.

2025-12-17 genetic and genomic medicine 10.64898/2025.12.17.25342163
Top 0.4%
25× avg

Background/Objectives: Coronary artery disease (CAD) remains the leading cause of mortality worldwide, with South Asia bearing a disproportionately high and rising burden, particularly at younger ages. This pilot study aimed to investigate genetic variants associated with premature coronary artery disease (PCAD) using whole genome sequencing (WGS). Methods: WGS was conducted on 12 individuals (5 PCAD cases, 7 matched controls) to assess feasibility and methodology for future large-scale research. High-quality genomic DNA was sequenced at a minimum read depth of 10× with a quality threshold of Q30. Variant calling with stringent quality control identified single nucleotide polymorphisms (SNPs), followed by annotation against gnomAD for allele frequencies and ClinVar for pathogenicity. Protein-coding variants were filtered, and candidate genes were prioritized for comparative analysis between cases and controls. Results: An average of over 8.8 million SNPs per individual was identified, with comparable overall variant distributions between cases and controls. Initial analyses revealed 120 SNPs present exclusively in PCAD cases. All protein-coding variants were rare (allele frequency <0.0001), and none were previously classified as pathogenic in ClinVar. After filtering, 87 candidate genes were prioritized. Variants enriched in or unique to PCAD cases mapped to genes involved in lipid metabolism, endothelial dysfunction, inflammatory signaling, immune regulation, thrombosis, vascular remodeling, and metabolic processes. Additional variants were identified in genes related to smooth muscle proliferation, oxidative stress, and other biological pathways. Conclusions: This WGS pilot study provides an initial overview of the genomic landscape of PCAD in a South Asian cohort, highlighting potential rare variants across multiple biological pathways implicated in atherosclerosis that need validation in a large-scale study. 
[Graphical abstract: Figure 1]

9
Multi-Omics Analysis of Genetic Drivers Linking Aortic Stenosis and Left Ventricular Diastolic Dysfunction in Heart Failure

Ahmed, Z.; Govindareddy, P.; Mathew, J.; Yanamala, N.; Sengupta, P. P.

2026-01-11 cardiovascular medicine 10.64898/2026.01.09.26343788
Top 0.6%
23× avg

Background: Aortic stenosis (AS) and left ventricular diastolic dysfunction (LVDD) often coexist in heart failure (HF), but the mechanisms linking them remain unclear. While AS increases afterload and promotes myocardial stiffening, emerging AI-based evidence suggests that LVDD can precede the development of AS or progress simultaneously, indicating shared upstream mechanobiological and inflammatory drivers. This study explores the genetic contributors connecting AS and LVDD to identify early molecular markers and convergent pathways in HF. Method: We analyzed whole-genome sequencing (WGS) and RNA-seq data generated from peripheral blood mononuclear cell (PBMC) samples of HF patients. The bioinformatics analysis was divided into two modules: 1) gene variant and annotation analysis, and 2) gene expression and enrichment analysis. We used our published, peer-reviewed, open-source WGS and RNA-seq pipelines to process next-generation sequencing (NGS) data. We then performed bioinformatics and statistical analyses to identify genetic variants, expression changes, regulation, enrichments, and disease annotations. Results: We identified genetic markers uniquely associated with AS or LVDD, as well as markers shared between them. 
Furthermore, we report genes with significant expression changes and functional variants, and discuss their relationship with other cardiovascular diseases (e.g., Vascular and Cardiac Stiffness, Aortic Dissection, Left Atrial Enlargement, Left Ventricular Hypertrophy, Outflow Tract Obstructive Defects, Non-Compaction Coronary Artery Disease, Arrhythmia, Congestive Heart Failure, and Hypertrophic, Dilated, and Ischemic Cardiomyopathy) and non-cardiovascular diseases (non-CVDs) (e.g., Type 1 Diabetes, Diabetic Nephropathy, Skeletal Anomalies, Rheumatoid Arthritis, Atypical Femoral Fractures, Chronic Kidney Disease, Dehydrated Hereditary Stomatocytosis, Schizophrenia, Varicose Veins, High-Altitude Pulmonary Edema, Periodontitis, and Respiratory Disorder), including multiple cancer types (e.g., Breast, Lung, Colorectal, Pancreatic, Hypopharyngeal, Acute Lymphoblastic, and Oral Squamous Cell Carcinomas) and rare genetic disorders (e.g., Hypophosphatasia, Multiple Sclerosis, Campomelic Dysplasia, Lymphatic Malformation). We validated our results against state-of-the-science literature, gene-disease annotation databases, and electronic health records. Conclusions: AS and LVDD share both clinical and genomic associations, with overlapping genetic drivers that are enriched in pathways related to inflammation, extracellular matrix remodeling, and vascular stress responses. This work supports the potential of blood-based multi-omics profiling to uncover early, systemic molecular signals of cardiac dysfunction and lays the groundwork for future tissue-specific studies to guide precision diagnosis, risk stratification, and targeted therapeutics in HF.

10
Proteomics Reveal Clusters of Hypertension Cases Associated with Differing Prevalence of Cardiovascular and Renal Complications

Pehova, Y.; Apella, S.; Kolobkov, D.; Malinowski, A. R.; Pawlowski, M.; Strivens, M. A.; Sardell, J.; Gardner, S.

2026-03-04 cardiovascular medicine 10.64898/2026.03.03.26347534
Top 0.7%
16× avg

Background: Hypertension affects over 30% of adults and is the leading risk factor for cardiovascular disease. It often presents without obvious symptoms, so although effective therapies exist, hypertension remains widely undiagnosed and insufficiently treated. Genomics-based prediction methods have shown only modest benefits for these disorders, but proteomic markers have demonstrated potential for greater predictive and clinical value. Methods: We applied a novel machine-learning-based patient stratification pipeline to proteomics data (2,911 proteins) for 7,086 hypertension patients from the UK Biobank Pharma Proteomics Project cohort. We evaluated the contribution of each protein to the output of a tree-based risk model to explore the combinations of protein expression values that naturally separate hypertension cases into clusters, and assessed the prevalence of cardiovascular and renal complications within each cluster. Results: We identified 10 clusters of hypertension patients segregated by differential expression of HAVCR1, PLAT, PTPRB, REN and RTN4R. Four of these clusters showed statistically significant enrichment for cardiovascular and renal complications, and three had a significantly lower prevalence of complications than expected among hypertension patients. Conclusion: We hypothesize that the hypertension clusters identified may represent distinct mechanistic subtypes. With further study, this could help focus research on subgroups of hypertension patients with a shared disease etiology, identify more personalized precision medicine treatment options for each subgroup, and develop mechanism-based biomarker tests to support enriched clinical trial recruitment.

11
Melanocyte loss dominates the vitiligo transcriptome: a rank-based meta-analysis of six independent studies

Ge, X.

2026-02-09 dermatology 10.64898/2026.02.07.26345817
Top 0.8%
16× avg

Vitiligo is an autoimmune disorder characterized by melanocyte destruction. We performed a rank-based meta-analysis of six independent transcriptomic studies (115 samples) spanning microarray, bulk and single-cell RNA-seq platforms to identify consensus signatures of lesional skin. Robust Rank Aggregation identified 114 differentially expressed genes (FDR < 0.05) with striking asymmetry: 108 downregulated versus 6 upregulated. Downregulated genes were dominated by melanocyte markers (MLANA, TYRP1, DCT, PMEL, KIT). Upregulated genes included interferon-stimulated genes (OAS1, OAS2, EPSTI1). Pathway-level meta-analysis confirmed uniform suppression of melanogenesis, while immune activation was heterogeneous across datasets. Single-cell data from three included studies confirmed melanocyte depletion. The 108 downregulated genes showed exclusive expression in melanocytes. These include neural genes (PLP1, GPM6B, NRXN3), consistent with melanocytes' neural crest origin. We also identified candidate melanocyte markers such as CYB561A3 and QPCT with high melanocyte specificity and consistent downregulation in vitiligo. These findings reveal a robust melanocyte loss signature in vitiligo, detectable across all platforms, and study-dependent immune activation possibly influenced by sampling method and disease characteristics.
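Robust Rank Aggregation scores a gene by comparing its sorted normalized ranks across studies with the order statistics of uniformly random ranks, taking the smallest resulting probability (rho), which is then Bonferroni-corrected by the number of studies. The Python sketch below implements that score under the simplifying assumption that every gene is ranked in every study; it is an illustration, not the code used in this meta-analysis, and the ranks are hypothetical.

```python
from math import comb


def order_stat_cdf(k, n, r):
    """P(k-th smallest of n independent Uniform(0, 1) ranks <= r), i.e. a Beta(k, n-k+1) CDF."""
    return sum(comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k, n + 1))


def rra_score(normalized_ranks):
    """Robust Rank Aggregation score for one gene, Bonferroni-corrected by the number of lists.

    normalized_ranks: rank / list length in each study, so values lie in (0, 1]
    and small values mean the gene sits near the top of that study's list.
    """
    r = sorted(normalized_ranks)
    n = len(r)
    rho = min(order_stat_cdf(k, n, r[k - 1]) for k in range(1, n + 1))
    return min(1.0, rho * n)


# Hypothetical normalized ranks for one gene across six studies.
print(rra_score([0.01, 0.02, 0.01, 0.03, 0.02, 0.01]) < 1e-6)  # True
print(rra_score([1.0] * 6))  # 1.0
```

Because rho takes the minimum over all order statistics, a gene only needs to rank consistently high in a subset of studies to score well, which is what makes the method robust to noisy or incomplete lists.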

12
Machine Learning-Based Identification of Blood Biomarkers that Distinguish Precachectic and Cachectic Patients with Pancreatic Ductal Adenocarcinoma

Olumoyin, K. D.; Park, M. A.; Davis, E.; Permuth, J. B.; Rejniak, K. A.

2025-12-27 health informatics 10.64898/2025.12.23.25342866
Top 0.8%
16× avg

Background: Identification of minimally invasive biomarkers of different stages of cachexia (Ca), and precachexia (PCa) in particular, might help clinicians treat patients with pancreatic ductal adenocarcinoma (PDAC) who are at high risk of progressing to a more severe cachectic stage. In this work, we developed a machine-learning (ML) model optimized for blood biomarker data that identifies precachectic and cachectic patients. Methods: Blood and clinical data were collected from treatment-naive patients with PDAC through the Florida Pancreas Collaborative (FPC), a multi-institutional cohort study and biobanking initiative. Blood was processed into serum and assayed for a total of 35 candidate biomarkers. Participants were classified as having noncachexia (NCa), precachexia, or cachexia according to modified criteria by Vigano and colleagues, which consider unintentional weight loss and biochemical data. Using these data, we designed ML algorithms to: (i) pre-select predictive blood biomarker candidates by combining the mutual information method with the leave-one-feature-out (LOFO) feature importance approach; (ii) identify the minimal combination of predictive biomarkers using forward feature selection; (iii) determine the optimal classification hyperparameters for the support vector machine using cross-validation; and (iv) adjust the decision-boundary threshold for imbalanced data using the Matthews correlation coefficient. Three ML-based binary predictors were designed to determine patients' cachexia status: NCa vs. Ca; PCa vs. Ca; and PCa vs. NCa. Results: Biomarker levels from 184 patients (28 NCa, 53 PCa, and 103 Ca) were used in this study. The NCa vs. Ca predictor identified a set of 6 biomarkers and yielded an area under the curve (AUC) of 0.835. The PCa vs. Ca predictor identified a set of 6 biomarkers and yielded an AUC of 0.810. The PCa vs. NCa predictor identified a set of 5 biomarkers and yielded an AUC of 0.771. 
Conclusions: The ML models developed using blood biomarker data effectively predicted patients' cachexia stage and can help clinicians diagnose PCa.
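Step (iv) can be read as choosing the classifier cutoff that maximizes the Matthews correlation coefficient (MCC) instead of a default cutoff, which matters when classes are imbalanced (here, for example, 28 NCa vs. 103 Ca). The sketch below assumes that reading and scans the observed scores as candidate thresholds; the function names, scores and labels are hypothetical and not taken from the study.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_mcc_threshold(scores, labels):
    """Return (threshold, mcc) maximizing MCC, predicting positive when score >= threshold."""
    best_t, best_m = None, -2.0
    for t in sorted(set(scores)):
        pred = [s >= t for s in scores]
        tp = sum(p and y == 1 for p, y in zip(pred, labels))
        fp = sum(p and y == 0 for p, y in zip(pred, labels))
        fn = sum((not p) and y == 1 for p, y in zip(pred, labels))
        tn = sum((not p) and y == 0 for p, y in zip(pred, labels))
        m = mcc(tp, tn, fp, fn)
        if m > best_m:  # ties keep the lower threshold
            best_t, best_m = t, m
    return best_t, best_m


# Hypothetical classifier scores (1 = cachexia, 0 = comparison group).
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.80]
labels = [0, 0, 1, 0, 1, 1]
t, m = best_mcc_threshold(scores, labels)
print(t, round(m, 3))  # 0.35 0.707
```

MCC uses all four cells of the confusion matrix, so unlike accuracy it does not reward a threshold that simply predicts the majority class.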

13
Deciphering the Genomic architecture of three major Cancers in African-Ancestry Populations

Enoma, D.; Idedia, A. M.; Ekenwaneze, C. C.; Dania, O. E.; Ogunlana, O. O.

2025-12-22 genetic and genomic medicine 10.64898/2025.12.19.25342629
Top 0.9%
15× avg

Genomic studies of cancer risk have disproportionately focused on populations of European ancestry, limiting biological insight and risk prediction in African-ancestry populations that experience a high burden of disease. Here, we analysed breast, colorectal, and prostate cancers in African-ancestry participants from the UK Biobank using ancestry-aware genome-wide association studies (GWAS), SNP-based heritability estimation, fine-mapping, transcriptome-wide association studies (TWAS), and polygenic risk scoring (PRS). SNP-based heritability analyses revealed a comparatively high point estimate of common-variant heritability for colorectal cancer risk in African-ancestry individuals, alongside more modest estimates for breast and prostate cancer. Five loci reached genome-wide significance (p < 5×10⁻⁸), including four colorectal cancer loci (notably rs111448231 in RYR2) and one novel breast cancer locus (rs78768133). Gene-based burden testing identified eight prostate cancer-associated genes (MRPL45, PSMD8, GGN, SPRED3, FAM98C, BCLAF1, MTFR2, and NELL2) with FDR-significant associations, clustering within biologically plausible chromosomal regions on chr19q13 and chr6q23. Transcriptome-wide association analysis identified CYTH2 (ENSG00000105443.13) as a significant gene for prostate cancer. Polygenic risk scores incorporating African-ancestry linkage disequilibrium demonstrated heterogeneous predictive performance across cancers, with modest discrimination for colorectal and breast cancer and substantially stronger performance for prostate cancer (AUC = 0.89). Together, these findings delineate ancestry-relevant cancer genetic architectures and demonstrate the importance of population-matched genomic approaches for equitable precision oncology.
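A polygenic risk score is, at its core, a weighted sum of effect-allele dosages, and its discrimination is commonly summarized by the AUC: the probability that a randomly chosen case scores higher than a randomly chosen control. The sketch below shows both calculations. It assumes the per-SNP weights have already been derived (the ancestry-aware LD modelling mentioned in the abstract would happen in that upstream step and is not shown), and all numbers are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0-2) using per-SNP weights (e.g. log-odds)."""
    return sum(d * w for d, w in zip(dosages, weights))


def auc(case_scores, control_scores):
    """AUC as P(random case score > random control score), counting ties as one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            wins += 1.0 if c > k else 0.5 if c == k else 0.0
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical weights for three SNPs and genotypes for two cases and two controls.
weights = [0.20, -0.10, 0.35]
cases = [polygenic_score(g, weights) for g in ([2, 0, 1], [1, 1, 2])]
controls = [polygenic_score(g, weights) for g in ([0, 2, 0], [1, 0, 1])]
print(auc(cases, controls))  # 1.0
```

The pairwise loop is quadratic and only suited to small examples; for biobank-scale data a rank-based (Mann-Whitney) computation would be used instead.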

14
Leveraging Explainable Temporal-Modelling Machine Learning to Identify Distinct Multimorbidity Trajectory Profiles in Acute Myocardial Infarction

Onoja, A.; Elomaa, K.; Whetton, A.; Geifman, N.

2026-01-16 health informatics 10.64898/2026.01.14.26344136
Top 0.9%
15× avg

Introduction: Acute myocardial infarction (AMI) remains a leading cause of mortality, and the coexistence of other conditions (i.e., multimorbidity) complicates management and outcomes. Healthcare providers currently face major challenges in caring for patients with multimorbidity, especially because multimorbidity is progressive and the temporal evolution of diseases is complex, with a profound impact on clinical outcomes. Methods: Data on 12,701 AMI patients were selected from the 502,000 UK Biobank volunteers and grouped into pre-AMI (up to 1 year before) and early post-AMI (within 5 years after) periods. Using Dynamic Time Warping (DTW) clustering, participants were clustered on the sequences of ICD-10 diagnoses they accumulated over the post-AMI period. Topic modelling of cluster-specific diagnoses informed thematic labels for these profiles (clusters) of AMI patients. Using pre-AMI data together with socio-demographic variables (age, IMD score, BMI, and sex), four supervised predictive models (Logistic Regression, Random Forest, XGBoost, and CatBoost) were developed, with CatBoost achieving the highest accuracy for predicting profile membership. Model interpretability via SHapley Additive exPlanations (SHAP) identified the key diagnostic categories driving profile assignments. Survival analyses then compared SMART (Second Manifestations of Arterial Disease) risk scores across the profiles, adjusting for clinical covariates, to evaluate adverse cardiovascular outcomes (death). Finally, Phenome-Wide Association Studies (PheWAS) were used to link profile-specific diagnostic themes to underlying genetic mechanisms.
Results: Using the above approaches, three multimorbidity profiles were identified in the post-AMI period: Acute cardio-renal-respiratory instability with chronic metabolic disease (ACUTE-CARD), Cardiometabolic disease with mixed arrhythmic-ischemic burden (CARDIOMIX), and Smoking-related cardiovascular disease with multimorbidity (SMO-CARD). CatBoost predicted profile membership with an AUROC of 0.77. Participants in the SMO-CARD cluster showed the highest mortality rates, while ACUTE-CARD had the most favourable outcomes (SMART risk score = 11.2; 6.8% CVD deaths). SMO-CARD displayed a broad range of cardiopulmonary and systemic associations. PheWAS revealed profile-specific genetic associations and pathway enrichments consistent with clinical features; for example, cardiometabolic genes were associated with the CARDIOMIX cluster, and immune-related pathways with SMO-CARD, supporting the biological plausibility of these profiles. Conclusion: Integrating temporal clustering with explainable machine learning reveals distinct multimorbidity patterns in AMI patients. This framework supports personalised risk stratification and outcome prediction in clinical care.
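The DTW clustering step above rests on a pairwise distance between diagnosis sequences that tolerates differences in timing and repetition. A minimal sketch of that distance for categorical codes follows; the 0/1 mismatch cost, the function names `dtw_distance` and `distance_matrix`, and the example ICD-10 sequences are illustrative assumptions, not details reported in the abstract. The resulting matrix would still need to be passed to a clustering algorithm, which is not shown.

```python
from typing import List, Sequence

def dtw_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Dynamic time warping distance between two diagnosis-code sequences.

    Assumed local cost: 0 when codes match, 1 otherwise. Each alignment step
    may advance one sequence or both, so repeated codes can be absorbed.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0.0 if a[i - 1] == b[j - 1] else 1.0
            cost[i][j] = local + min(cost[i - 1][j],      # advance a only
                                     cost[i][j - 1],      # advance b only
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def distance_matrix(seqs: Sequence[Sequence[str]]) -> List[List[float]]:
    """Symmetric pairwise DTW distances, ready for a clustering algorithm."""
    return [[dtw_distance(s, t) for t in seqs] for s in seqs]

# Illustrative sequences: I21 acute MI, I50 heart failure, N18 chronic kidney disease, J44 COPD.
a = ["I21", "I50", "N18"]
b = ["I21", "I21", "I50", "N18"]
c = ["J44", "J44"]
print(distance_matrix([a, b, c]))  # [[0.0, 0.0, 3.0], [0.0, 0.0, 4.0], [3.0, 4.0, 0.0]]
```

Sequences `a` and `b` are at distance 0 because warping aligns the repeated `I21` with a single occurrence; this is what lets DTW compare trajectories of different length and pacing. A flat 0/1 cost ignores clinical similarity between different codes, which a real analysis might address with a code-hierarchy-aware cost.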

15
Integrated Cardiac and Circulating N-glycan Signatures Reflect Atrial Remodeling in Patients with Atrial Fibrillation

Yiu, C. H. K.; Cheeseman, J.; Elgood-Hunt, G.; Ma, C. S.; Banerjee, A.; Moreira, L. M.; Johnston, A. M.; Mehta, N.; Cox, K.; Betts, T. R.; Rajappan, K.; Ginks, M.; Pedersen, M.; Bashir, Y.; Wijesurendra, R. S.; Sayeed, R.; Krasopoulos, G.; Srivastava, V.; Kourliouros, A.; Spencer, D. I. R.; Reilly, S.

2026-02-01 cardiovascular medicine 10.64898/2026.01.29.26345171
Top 0.9%
15× avg

Background: Atrial adverse remodeling drives the maintenance and progression of atrial fibrillation (AF) through electrical and structural myocardial changes, often accompanied by inflammation. Circulating N-glycans are emerging as biomarkers in inflammatory diseases, yet their role in AF remains undefined. Methods: We profiled the serum N-glycome of 138 patients with AF, non-AF arrhythmias, or sinus rhythm (SR) controls from peripheral venous (PV) and coronary sinus (CS) samples using hydrophilic interaction liquid chromatography coupled with high-resolution mass spectrometry. Glycan traits associated with AF were identified via logistic regression adjusted for clinical risk factors. Multivariate glycan scores were derived from PV and CS datasets using LASSO regression. In a subset (N=37), plasma proteome profiling was performed with the Olink Reveal panel. Results: Sixty-two glycan peaks were detected; 27 in PV and 8 in CS serum differed significantly between AF and controls. PV and CS glycan scores accurately classified AF, with the PV score correlating with 11 plasma proteins linked to structural remodeling and thrombo-inflammatory processes. The most abundant glycan, A2G2S2 (peak 30), was associated with higher odds of AF after adjusting for confounders (OR 2.22 [95% CI: 1.40-3.75], P = 0.001). CS A2G2S2 correlated with C-reactive protein (R = 0.432, P = 0.0275) and was elevated in patients with left atrial enlargement (P = 0.0354), but unchanged in those with impaired left ventricular ejection fraction or hypertrophy. Conclusion: Integrated profiling of peripheral and cardiac serum identifies novel N-glycosylation signatures in AF. Specific cardiac and circulating N-glycan signatures, including A2G2S2, are associated with AF and reflect inflammation-driven atrial remodeling, highlighting potential mechanistic pathways and biomarker applications.

16
Rates of Adherence to Colorectal Cancer Medications and Predictors of Non-Adherence: A Systematic Review and Meta-Analysis

Raj, R.; Abegaz, T. M.; Nechi, R. N.; Donneyong, M. M.

2026-01-08 health informatics 10.64898/2026.01.07.26343638
Top 0.9%
15× avg

PURPOSE: To synthesize adherence rates to colorectal cancer (CRC) medications and identify predictors of nonadherence. METHODS: Following PRISMA, we searched PubMed, Embase, PsycINFO, and Web of Science through August 13, 2024. Observational studies reporting adherence or its predictors were eligible. Two reviewers independently screened studies, extracted data, and assessed risk of bias using the JBI Checklist for Cohort Studies. Adherence was grouped by measurement approach: claims-based proportion of days covered/medication possession ratio (PDC/MPR), chart/clinical record review, or patient report. Random-effects meta-analyses were performed within clinically and methodologically homogeneous subgroups. RESULTS: Thirteen studies met the inclusion criteria, with adherence ranging from 33% to 100%. In claims-based analyses using PDC/MPR thresholds, pooled adherence was about 40% (95% CI, 36%-44%), with substantial heterogeneity (I² = 84.6%). Pooled adherence was about 83% for both chart/record review (95% CI, 44%-97%; I² = 71.4%) and patient-reported measures (95% CI, 68%-92%; I² = 93.8%), again with substantial heterogeneity. Nonadherence was more likely with advanced stage, ECOG performance status ≥1, multiple prior regimens, female sex, and treatment-related adverse events. The overall risk of bias was low, although some included studies lacked complete follow-up or strategies to address incomplete follow-up. CONCLUSION: We synthesized adherence to CRC medications and identified consistent predictors of nonadherence. Adherence was lowest when measured with claims-based PDC/MPR and higher with chart review or patient-reported measures. These findings support targeted interventions for patients at higher risk of nonadherence, including those with advanced-stage disease, those receiving multiple regimens, and those experiencing adverse events. Future work should use standardized adherence definitions and metric-specific reporting to enable valid pooling.
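The random-effects pooling described above can be sketched with the DerSimonian-Laird estimator, which also yields the I² heterogeneity statistic the abstract reports. The abstract does not state which estimator or transformation the authors used, so the choice of DerSimonian-Laird on the logit scale, the function names `dersimonian_laird` and `pool_proportions`, and the toy counts are assumptions for illustration only.

```python
import math
from typing import Sequence, Tuple

def dersimonian_laird(estimates: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float, float, float]:
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled estimate, its standard error, tau^2, I^2 in percent).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]                        # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                       # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]             # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, se, tau2, i2

def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def pool_proportions(events: Sequence[int], totals: Sequence[int]):
    """Pool adherence proportions on the logit scale; returns (estimate, 95% CI low, high, I^2).

    Requires 0 < events < totals in every study (no continuity correction here).
    """
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]
    v = [1.0 / e + 1.0 / (n - e) for e, n in zip(events, totals)]
    pooled, se, _, i2 = dersimonian_laird(y, v)
    return expit(pooled), expit(pooled - 1.96 * se), expit(pooled + 1.96 * se), i2

# Hypothetical studies: 10/100 and 90/100 patients adherent (strongly heterogeneous).
print(pool_proportions([10, 90], [100, 100]))
```

With two studies at 10% and 90% adherence, the pooled estimate is 0.5 with a wide interval and I² close to 99%, showing how pooled values like those above can carry very wide confidence intervals when heterogeneity is substantial.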

17
Genome-Wide Association Study of Genetic Variants Associated with Lower Extremity Amputation Risk in Peripheral Artery Disease

Korutla, R.; Garg, T.; Wilczek, M. P.; Ross, E. G.; Amal, S.

2025-12-20 genetic and genomic medicine 10.64898/2025.12.17.25342479
Top 0.9%
15× avg

Peripheral artery disease (PAD) is a global health burden affecting over 200 million individuals and is frequently complicated by limb-threatening ischemia, leading to major amputations. Despite known clinical risk factors, the genetic basis underlying amputation risk in PAD remains poorly defined. In this study, we performed a multi-pronged genome-wide association study (GWAS) to identify genetic variants associated with lower extremity amputation in patients with PAD, using data from the All of Us Research Program. Two analytical strategies were employed: a targeted GWAS using ClinVar variants on the full cohort and a comprehensive genome-wide association study using Allele Count/Allele Frequency (ACAF) data on a balanced subset. The ClinVar analysis of 118,871 variants in 14,771 PAD patients (613 with amputation, 14,158 without) identified 3 suggestive associations with a genomic inflation factor of 1.046. The ACAF analysis of 7,784,837 quality-controlled variants in 804 balanced samples (399 cases, 405 controls) yielded 35 suggestive associations (p < 1 × 10⁻⁵) with a genomic inflation factor of 1.017. No variants achieved suggestive significance in both analyses. These findings highlight candidate loci for further validation and may inform future development of risk prediction tools and targeted interventions to reduce limb loss in PAD.
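The genomic inflation factors reported above (1.046 and 1.017) follow the usual median-based definition: the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455), so values near 1 suggest little inflation from population stratification or cryptic relatedness. A minimal sketch using only the Python standard library follows; the function name `genomic_inflation` and the evenly spaced null p-values are hypothetical, and it assumes two-sided p-values from 1-df tests.

```python
from statistics import NormalDist, median
from typing import Sequence

def genomic_inflation(p_values: Sequence[float]) -> float:
    """Genomic inflation factor (lambda_GC) from two-sided 1-df association p-values."""
    std_normal = NormalDist()
    # Convert each p-value to a 1-df chi-square statistic: z = Phi^-1(p/2), chi2 = z^2.
    # Using p/2 (rather than 1 - p/2) keeps precision for tiny p; requires 0 < p <= 1.
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    # Expected median of a chi-square(1) variable, approximately 0.4549.
    expected_median = std_normal.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median

# Evenly spaced p-values mimic a well-calibrated null distribution.
n = 1001
null_p = [(i - 0.5) / n for i in range(1, n + 1)]
print(round(genomic_inflation(null_p), 3))  # 1.0
```

Because lambda depends only on the median statistic, a handful of true associations barely moves it, whereas systematic confounding shifts the whole distribution and pushes lambda above 1.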

18
Genome-wide association studies to identify shared and distinct mechanisms of fibrosis across 12 organ-systems

Joof, E.; Hernandez-Beeftink, T.; Parcesepe, G.; Massen, G. M.; Nabunje, R.; Power, H. J.; Woodward, R.; Altunusi, F.; Leavy, O. C.; Longhurst, H. J.; Jenkins, R. G.; Quint, J. K.; Wain, L. V.; Allen, R. J.

2026-02-19 genetic and genomic medicine 10.64898/2026.02.18.26346458
Top 1%
15× avg

Introduction: Fibrosis can affect organs throughout the body and is present in a wide range of diseases. Recent research has suggested that there could be shared biological mechanisms that lead to fibrosis in different organs. Methods: We performed genome-wide association studies using UK Biobank for fibrosis in 12 different organ-systems and meta-analysed results with previously published studies of fibrotic diseases. We considered genetic associations that colocalised across ≥3 organs as those likely to be involved in general fibrotic mechanisms and also identified novel genetic variants not previously reported as associated with fibrosis. Genetic correlation of fibrosis between organs was calculated using linkage disequilibrium score regression (LDSC). Discovery analyses were performed using European ancestry individuals and results were tested further in African, South Asian and East Asian ancestry groups. Results: We identified eight genetic loci that colocalised across three or more organs. One of these signals, located near the SH2B3 and ATXN2 genes, showed evidence of a shared causal variant for fibrosis across five organs. We also identified two novel fibrotic associations, one implicating alternative splicing of TFCP2L1 for urinary fibrosis and another implicating a missense variant in FAM180A for intestinal-pancreatic fibrosis. We observed significant genetic correlations for all organs, particularly for liver and skeletal fibrosis. Conclusion: We found evidence of shared genetic associations for fibrosis across organs, both at individual genetic loci and genome-wide. This highlights specific genes that may contribute to fibrosis across organs and diseases, which may facilitate the development of new therapies.

19
Masticatory Performance and Mortality in Individuals with Cardiovascular Diseases: A Population-Based Prospective Study

Zheng, Z.; Wu, J.; Li, Q.; Qin, X.

2026-01-04 cardiovascular medicine 10.64898/2026.01.02.26343366
Top 1%
14× avg

Background: The relationship between functional tooth units (FTU) and cardiovascular disease (CVD) risk remains understudied. This study aimed to investigate the associations of masticatory performance, measured by FTU, with CVD incidence, cardiovascular health (CVH), and mortality among individuals with CVD, using data from the National Health and Nutrition Examination Survey (NHANES). Methods: Masticatory function was measured by FTU, CVD was determined from a self-reported questionnaire, and CVH was assessed using the American Heart Association's Life's Simple 7 (LS7), Life's Essential 8 (LE8), and Life's Crucial 9 (LC9). CVD mortality was determined from the National Death Index. Weighted logistic regression, linear regression, and Cox proportional hazards models were used to evaluate the associations of FTU with CVD incidence, CVH, and mortality. Results: Higher FTU was associated with a reduced risk of CVD. Participants with FTU scores of 9-12 had a 46% lower risk of CVD than those with FTU scores of 0-3. FTU was also positively correlated with CVH scores on LS7, LE8, and LC9. In the mortality analysis, an FTU of 9-12 was associated with 42%, 39%, and 71% reductions in all-cause, CVD-related, and cancer-related mortality, respectively. Other masticatory function parameters and sensitivity analyses confirmed these relationships. Conclusion: Preservation of masticatory function, measured by FTU, is associated with lower CVD prevalence, better CVH, and reduced mortality in individuals with CVD. These findings suggest that maintaining optimal masticatory function may play a protective role in reducing CVD incidence and mortality. Clinical Relevance: What Is New? This study provides novel insights into the association between masticatory function, assessed by functional tooth units (FTU), and cardiovascular disease (CVD) outcomes in individuals with CVD.
It demonstrates that higher FTU scores are linked to a significantly lower risk of CVD, improved cardiovascular health (CVH) scores, and reduced mortality, including CVD- and cancer-related deaths. Importantly, this is one of the first studies to show the potential role of masticatory function as a modifiable factor in cardiovascular health and mortality reduction in a large, nationally representative cohort. What Are the Clinical Implications? The findings highlight the importance of maintaining masticatory function as part of a comprehensive approach to CVD prevention and management. Clinicians should consider the role of oral health, specifically masticatory function, in improving cardiovascular health outcomes and reducing mortality risk in CVD patients. These results suggest that enhancing masticatory function may be a simple yet effective intervention for reducing CVD incidence and mortality, emphasizing the need for oral health preservation in clinical practice, particularly among individuals with cardiovascular disease.

20
Massively parallel functional profiling identifies CCDC88C as a risk gene for ER-positive breast cancer

Mackie, K.; Kemp, H.; Gunnell, A.; Studd, J. B.; Went, M.; Law, P.; Tomczyk, K.; Sevgi, S.; Lu, Y.; Orr, N.; Houlston, R. S.; Johnson, N.; Fletcher, O.; Haider, S.

2026-03-03 genetic and genomic medicine 10.64898/2026.03.02.26347419
Top 1%
13× avg
Show abstract

Genome-wide association studies (GWAS), combined with fine-mapping, have identified 196 independent signals associated with breast cancer risk. Deciphering the functional basis of these associations can inform our understanding of the biology and aetiology of breast cancer. Decoding GWAS risk associations is challenging because of linkage disequilibrium between variants and because most variants map to non-coding regions, where they influence breast cancer risk via cis-regulatory mechanisms that modulate the expression of target genes. To identify the functional variants driving breast cancer risk associations, we carried out a lentivirus-based massively parallel reporter assay (lentiMPRA) to screen 5,116 credible causal variants across these signals. We identified 709 variants, mapping to 140 risk regions, with significant differences in reporter activity between REF and ALT alleles. A follow-up investigation at 14q32.11 revealed that rs7153397 may affect CCDC88C expression and thereby influence both breast cancer risk and prognosis. These findings provide a prioritised set of functional variants for downstream analyses, advancing our understanding of breast cancer risk mechanisms.